{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 《数据分析工具箱：Python篇》课后习题"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第2节课 数据类型与Array产生"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.随机产生一个3维（4,3,5）的数组。数值为0到1之间。 seed设为0\n",
    "\n",
    "### 2.产生一个2维（3，5）的等差数组，数字为0到14"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[[0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ],\n",
       "        [0.64589411, 0.43758721, 0.891773  , 0.96366276, 0.38344152],\n",
       "        [0.79172504, 0.52889492, 0.56804456, 0.92559664, 0.07103606]],\n",
       "\n",
       "       [[0.0871293 , 0.0202184 , 0.83261985, 0.77815675, 0.87001215],\n",
       "        [0.97861834, 0.79915856, 0.46147936, 0.78052918, 0.11827443],\n",
       "        [0.63992102, 0.14335329, 0.94466892, 0.52184832, 0.41466194]],\n",
       "\n",
       "       [[0.26455561, 0.77423369, 0.45615033, 0.56843395, 0.0187898 ],\n",
       "        [0.6176355 , 0.61209572, 0.616934  , 0.94374808, 0.6818203 ],\n",
       "        [0.3595079 , 0.43703195, 0.6976312 , 0.06022547, 0.66676672]],\n",
       "\n",
       "       [[0.67063787, 0.21038256, 0.1289263 , 0.31542835, 0.36371077],\n",
       "        [0.57019677, 0.43860151, 0.98837384, 0.10204481, 0.20887676],\n",
       "        [0.16130952, 0.65310833, 0.2532916 , 0.46631077, 0.24442559]]])"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.random.seed(0)\n",
    "np.random.random((4,3,5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3,  4],\n",
       "       [ 5,  6,  7,  8,  9],\n",
       "       [10, 11, 12, 13, 14]])"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.arange(15).reshape(3,5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第二节课 数据选择 Numpy计算"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3,  4],\n",
       "       [ 5,  6,  7,  8,  9],\n",
       "       [10, 11, 12, 13, 14]])"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a=np.arange(15).reshape(3,5)\n",
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.选择第一行第二列的数据\n",
    "\n",
    "### 2.选择第三列的数据\n",
    "\n",
    "### 3.选择所有在第二第三行，第四第五列的数据\n",
    "\n",
    "### 4.第一行每个数据加一，第二行每个数据加二，其他数据不变\n",
    "\n",
    "### 5.第一列每个数据 $\\times$ 2，第二列每个数据 $\\times$ 3，其他数据不变"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a[0,1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 2,  7, 12])"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a[:,2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 8,  9],\n",
       "       [13, 14]])"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a[1:,3:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "tags": [
     "4."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1,  2,  3,  4,  5],\n",
       "       [ 7,  8,  9, 10, 11],\n",
       "       [10, 11, 12, 13, 14]])"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a+np.array([[1],[2],[0]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {
    "tags": [
     "5."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  3,  2,  3,  4],\n",
       "       [10, 18,  7,  8,  9],\n",
       "       [20, 33, 12, 13, 14]])"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a*np.array([2,3,1,1,1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第三节课 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3],\n",
       "       [ 4,  5,  6,  7],\n",
       "       [ 8,  9, 10, 11]])"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a=np.arange(12).reshape(3,4)  \n",
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.深层拷贝a，命名为b\n",
    "\n",
    "### 2.把b的第三行，第四列的数据变成8\n",
    "\n",
    "### 3.把a按行展开\n",
    "\n",
    "### 4.把a按列展开"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3],\n",
       "       [ 4,  5,  6,  7],\n",
       "       [ 8,  9, 10, 11]])"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b=a.copy()\n",
    "b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3],\n",
       "       [ 4,  5,  6,  7],\n",
       "       [ 8,  9, 10,  8]])"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b[2,3]=8\n",
    "b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.ravel()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "tags": [
     "4"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.ravel('F')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第三节课-2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3],\n",
       "       [ 4,  5,  6,  7],\n",
       "       [ 8,  9, 10, 11]])"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a=np.arange(12).reshape(3,4)  \n",
    "a"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.计算a每行的平均值\n",
    "\n",
    "### 2.寻找每列最大值并返回index\n",
    "\n",
    "### 3.对每行排序，从大到小，命名为b\n",
    "\n",
    "### 4.保存b为exercise.npy的文件"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1.5, 5.5, 9.5])"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.mean(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([2, 2, 2, 2], dtype=int64)"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.argmax(axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 3,  2,  1,  0],\n",
       "       [ 7,  6,  5,  4],\n",
       "       [11, 10,  9,  8]])"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b=-np.sort(-a)\n",
    "b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "4."
    ]
   },
   "outputs": [],
   "source": [
    "np.save('exercise.npy',b)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第四节课 Pandas Series创建，性质，Index排序"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.用两种方法创建一个学生成绩的series，student a 86分，student b 90分，student c 75分，student d 100分\n",
    "\n",
    "### 2.产生包括所有学生的list\n",
    "\n",
    "### 3.产生所有分数的list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student a     86\n",
       "student b     90\n",
       "student c     75\n",
       "student d    100\n",
       "dtype: int64"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score=pd.Series([86,90,75,100],index=['student a','student b','student c','student d'])\n",
    "score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student a     86\n",
       "student b     90\n",
       "student c     75\n",
       "student d    100\n",
       "dtype: int64"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series({'student a':86,'student b':90,'student c':75,'student d':100})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['student a', 'student b', 'student c', 'student d']"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score.index.tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[86, 90, 75, 100]"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score.values.tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.选出student c的成绩\n",
    "\n",
    "### 5.选出分数高于85分的学生"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {
    "tags": [
     "4."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "75"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score['student c']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['student a', 'student b', 'student d']"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score[score>85].index.tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 第四节课 Pandas 计算 缺失数据处理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.创建一个学生数学成绩的series，student a 86分，student b 90分，student c 75分，student d 100分\n",
    "\n",
    "### 2.创建一个学生语文成绩的series，student a 100分，student b 60分，student c 85分，student d 98分\n",
    "\n",
    "### 3.计算每个学生的平均分\n",
    "\n",
    "### 4.计算每个学生数学和语文的分数差绝对值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student a     86\n",
       "student b     90\n",
       "student c     75\n",
       "student d    100\n",
       "dtype: int64"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "math=pd.Series([86,90,75,100],index=['student a','student b','student c','student d'])\n",
    "math"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student a    100\n",
       "student b     60\n",
       "student c     85\n",
       "student d     98\n",
       "dtype: int64"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chinese=pd.Series([100,60,85,98],index=['student a','student b','student c','student d'])\n",
    "chinese"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student a   93.0\n",
       "student b   75.0\n",
       "student c   80.0\n",
       "student d   99.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(math+chinese)/2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {
    "tags": [
     "4."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "student a    14\n",
       "student b    30\n",
       "student c    10\n",
       "student d     2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "abs(math-chinese)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第五节课 Dataframe创建，选择"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({\n",
    "    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],\n",
    "    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],\n",
    "    'open': [426.23, 427.81, 42.3, 101.65, 35.53]\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.用两种方法选择open，结果是一个series\n",
    "\n",
    "### 2.选择open，结果是一个dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   426.2\n",
       "1   427.8\n",
       "2    42.3\n",
       "3   101.7\n",
       "4    35.5\n",
       "Name: open, dtype: float64"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['open']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   426.2\n",
       "1   427.8\n",
       "2    42.3\n",
       "3   101.7\n",
       "4    35.5\n",
       "Name: open, dtype: float64"
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[:,2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>open</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>426.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>427.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>42.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>101.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>35.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   open\n",
       "0 426.2\n",
       "1 427.8\n",
       "2  42.3\n",
       "3 101.7\n",
       "4  35.5"
      ]
     },
     "execution_count": 114,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['open']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第五节课 多维选择条件选择安全选择，不安全选择，index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({\n",
    "    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],\n",
    "    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],\n",
    "    'open': [426.23, 427.81, 42.3, 101.65, 35.53]\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.对所有ticker是AAPL的行，选择date和open\n",
    "\n",
    "### 2.通过df新建一个df1，df1以ticker作为index\n",
    "\n",
    "### 3.通过df1新建一个df2，以date作为index\n",
    "\n",
    "### 4.通过index 排序df2，递增排序\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>open</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>426.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>427.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date  open\n",
       "0  2015-12-30 426.2\n",
       "1  2015-12-31 427.8"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['ticker']=='AAPL'][['date','open']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>open</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ticker</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>AAPL</th>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>426.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AAPL</th>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>427.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MSFT</th>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>42.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>IBM</th>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>101.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>YHOO</th>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>35.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              date  open\n",
       "ticker                  \n",
       "AAPL    2015-12-30 426.2\n",
       "AAPL    2015-12-31 427.8\n",
       "MSFT    2015-12-30  42.3\n",
       "IBM     2015-12-30 101.7\n",
       "YHOO    2015-12-30  35.5"
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1=df.set_index('ticker')\n",
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>open</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2015-12-30</th>\n",
       "      <td>426.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-12-31</th>\n",
       "      <td>427.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-12-30</th>\n",
       "      <td>42.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-12-30</th>\n",
       "      <td>101.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-12-30</th>\n",
       "      <td>35.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            open\n",
       "date            \n",
       "2015-12-30 426.2\n",
       "2015-12-31 427.8\n",
       "2015-12-30  42.3\n",
       "2015-12-30 101.7\n",
       "2015-12-30  35.5"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2=df1.set_index('date')\n",
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {
    "tags": [
     "4."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>open</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2015-12-30</th>\n",
       "      <td>426.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-12-30</th>\n",
       "      <td>42.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-12-30</th>\n",
       "      <td>101.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-12-30</th>\n",
       "      <td>35.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-12-31</th>\n",
       "      <td>427.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            open\n",
       "date            \n",
       "2015-12-30 426.2\n",
       "2015-12-30  42.3\n",
       "2015-12-30 101.7\n",
       "2015-12-30  35.5\n",
       "2015-12-31 427.8"
      ]
     },
     "execution_count": 122,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.sort_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第五节课 dataframe修改，错误修改，保存"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({\n",
    "    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],\n",
    "    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],\n",
    "    'open': [426.23, 427.81, 42.3, 101.65, 35.53]\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.复制一个df，命名为df3。 新增一个close column，对open值大于100的行数，open等于close，其余行close是nan\n",
    "\n",
    "### 2.对df3基于close排序，递减排序"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [],
   "source": [
    "df3=df.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>date</th>\n",
       "      <th>open</th>\n",
       "      <th>close</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>426.2</td>\n",
       "      <td>426.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>427.8</td>\n",
       "      <td>427.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>42.3</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>IBM</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>101.7</td>\n",
       "      <td>101.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>YHOO</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>35.5</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker        date  open  close\n",
       "0   AAPL  2015-12-30 426.2  426.2\n",
       "1   AAPL  2015-12-31 427.8  427.8\n",
       "2   MSFT  2015-12-30  42.3    nan\n",
       "3    IBM  2015-12-30 101.7  101.7\n",
       "4   YHOO  2015-12-30  35.5    nan"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3.loc[df3['open']>100,'close']=df3['open']\n",
    "df3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>date</th>\n",
       "      <th>open</th>\n",
       "      <th>close</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>427.8</td>\n",
       "      <td>427.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>426.2</td>\n",
       "      <td>426.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>IBM</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>101.7</td>\n",
       "      <td>101.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>42.3</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>YHOO</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>35.5</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker        date  open  close\n",
       "1   AAPL  2015-12-31 427.8  427.8\n",
       "0   AAPL  2015-12-30 426.2  426.2\n",
       "3    IBM  2015-12-30 101.7  101.7\n",
       "2   MSFT  2015-12-30  42.3    nan\n",
       "4   YHOO  2015-12-30  35.5    nan"
      ]
     },
     "execution_count": 127,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3.sort_values(['close'],ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第六节课 Dataframe性质"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.创建一个学生成绩的dataframe，数学成绩 student a 86分，student b 90分，student c 75分，student d 100分，语文成绩 student a 100分，student b 60分，student c 85分，student d 98分\n",
    "\n",
    "### 2.列出score的column,index和value\n",
    "\n",
    "### 3.计算数学成绩和语文成绩的分布"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>math</th>\n",
       "      <th>chinese</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>student a</th>\n",
       "      <td>86</td>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student b</th>\n",
       "      <td>90</td>\n",
       "      <td>60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student c</th>\n",
       "      <td>75</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>student d</th>\n",
       "      <td>100</td>\n",
       "      <td>98</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           math  chinese\n",
       "student a    86      100\n",
       "student b    90       60\n",
       "student c    75       85\n",
       "student d   100       98"
      ]
     },
     "execution_count": 129,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score=pd.DataFrame({'math':[86,90,75,100],\n",
    "             'chinese':[100,60,85,98]},index=['student a','student b','student c','student d'])\n",
    "\n",
    "score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['math', 'chinese'], dtype='object')"
      ]
     },
     "execution_count": 130,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['student a', 'student b', 'student c', 'student d'], dtype='object')"
      ]
     },
     "execution_count": 131,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 86, 100],\n",
       "       [ 90,  60],\n",
       "       [ 75,  85],\n",
       "       [100,  98]], dtype=int64)"
      ]
     },
     "execution_count": 135,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score.values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>math</th>\n",
       "      <th>chinese</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>4.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>87.8</td>\n",
       "      <td>85.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>10.3</td>\n",
       "      <td>18.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>75.0</td>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>83.2</td>\n",
       "      <td>78.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>88.0</td>\n",
       "      <td>91.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>92.5</td>\n",
       "      <td>98.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>100.0</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       math  chinese\n",
       "count   4.0      4.0\n",
       "mean   87.8     85.8\n",
       "std    10.3     18.4\n",
       "min    75.0     60.0\n",
       "25%    83.2     78.8\n",
       "50%    88.0     91.5\n",
       "75%    92.5     98.5\n",
       "max   100.0    100.0"
      ]
     },
     "execution_count": 136,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "score.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第六节课 copy，缺失数据处理：fillna，dropna"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 151,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({\n",
    "   'ticker': ['AAPL', 'AAPL', 'AAPL','MSFT', 'IBM', 'YHOO'],\n",
    "    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-29','2015-12-31', '2015-12-31'],\n",
    "    'open': [426.23, 500.51, 42.3, 101.65, 35.53,np.nan]})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.复制一个df，命名为df2\n",
    "\n",
    "### 2.向前填充缺失值\n",
    "\n",
    "### 3.删除包含缺失值的row\n",
    "\n",
    "### 4.删除包含缺失值的column\n",
    "\n",
    "### 5.对每个ticker，只保留最近一条记录\n",
    "\n",
    "### 6.删除date\n",
    "\n",
    "### 7.删除第一行记录"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 152,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [],
   "source": [
    "df2=df.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 153,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>date</th>\n",
       "      <th>open</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>426.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>500.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>42.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>2015-12-29</td>\n",
       "      <td>101.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>IBM</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>35.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>YHOO</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>35.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker        date  open\n",
       "0   AAPL  2015-12-30 426.2\n",
       "1   AAPL  2015-12-31 500.5\n",
       "2   AAPL  2015-12-30  42.3\n",
       "3   MSFT  2015-12-29 101.7\n",
       "4    IBM  2015-12-31  35.5\n",
       "5   YHOO  2015-12-31  35.5"
      ]
     },
     "execution_count": 153,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.ffill()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 154,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>date</th>\n",
       "      <th>open</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>426.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>500.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>42.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>2015-12-29</td>\n",
       "      <td>101.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>IBM</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>35.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker        date  open\n",
       "0   AAPL  2015-12-30 426.2\n",
       "1   AAPL  2015-12-31 500.5\n",
       "2   AAPL  2015-12-30  42.3\n",
       "3   MSFT  2015-12-29 101.7\n",
       "4    IBM  2015-12-31  35.5"
      ]
     },
     "execution_count": 154,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 155,
   "metadata": {
    "tags": [
     "4."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>date</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>2015-12-29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>IBM</td>\n",
       "      <td>2015-12-31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>YHOO</td>\n",
       "      <td>2015-12-31</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker        date\n",
       "0   AAPL  2015-12-30\n",
       "1   AAPL  2015-12-31\n",
       "2   AAPL  2015-12-30\n",
       "3   MSFT  2015-12-29\n",
       "4    IBM  2015-12-31\n",
       "5   YHOO  2015-12-31"
      ]
     },
     "execution_count": 155,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.dropna(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 156,
   "metadata": {
    "tags": [
     "5."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>date</th>\n",
       "      <th>open</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>500.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>IBM</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>35.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>2015-12-29</td>\n",
       "      <td>101.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>YHOO</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker        date  open\n",
       "1   AAPL  2015-12-31 500.5\n",
       "4    IBM  2015-12-31  35.5\n",
       "3   MSFT  2015-12-29 101.7\n",
       "5   YHOO  2015-12-31   nan"
      ]
     },
     "execution_count": 156,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.sort_values(['ticker','date']).drop_duplicates(subset=['ticker'],keep='last')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 158,
   "metadata": {
    "tags": [
     "6."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>open</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>426.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>500.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>42.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>101.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>IBM</td>\n",
       "      <td>35.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>YHOO</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker  open\n",
       "0   AAPL 426.2\n",
       "1   AAPL 500.5\n",
       "2   AAPL  42.3\n",
       "3   MSFT 101.7\n",
       "4    IBM  35.5\n",
       "5   YHOO   nan"
      ]
     },
     "execution_count": 158,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.drop(['date'],axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "metadata": {
    "tags": [
     "7."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>date</th>\n",
       "      <th>open</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>500.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2015-12-30</td>\n",
       "      <td>42.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>2015-12-29</td>\n",
       "      <td>101.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>IBM</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>35.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>YHOO</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker        date  open\n",
       "1   AAPL  2015-12-31 500.5\n",
       "2   AAPL  2015-12-30  42.3\n",
       "3   MSFT  2015-12-29 101.7\n",
       "4    IBM  2015-12-31  35.5\n",
       "5   YHOO  2015-12-31   nan"
      ]
     },
     "execution_count": 160,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.drop(0,axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第七节课"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "pd.set_option('display.float_format', '{:,.1f}'.format)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "dat = pd.read_csv('./data/weather-6m.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.计算每个月 `air_temp` 的平均值\n",
    "\n",
    "### 2.用 describe 描述 air_temp 和 dew_point\n",
    "\n",
    "### 3.只对1月和2月, 计算每小时air_temp 的平均值和std\n",
    "\n",
    "### 4.计算每月air_temp 的分位数[0.2,0.4,0.6,0.8]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>air_temp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>month</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-100.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-29.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>70.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>139.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>180.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       air_temp\n",
       "month          \n",
       "1        -100.1\n",
       "2         -29.5\n",
       "3          21.0\n",
       "4          70.3\n",
       "5         139.8\n",
       "6         180.6"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dat.groupby(['month'])[['air_temp']].mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count   4,161.0\n",
       "mean       47.4\n",
       "std       114.0\n",
       "min      -294.0\n",
       "25%       -33.0\n",
       "50%        44.0\n",
       "75%       133.0\n",
       "max       333.0\n",
       "Name: air_temp, dtype: float64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dat['air_temp'].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count   4,161.0\n",
       "mean       -9.8\n",
       "std       105.9\n",
       "min      -328.0\n",
       "25%       -83.0\n",
       "50%       -11.0\n",
       "75%        72.0\n",
       "max       233.0\n",
       "Name: dew_point, dtype: float64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dat['dew_point'].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>hour</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-57.0</td>\n",
       "      <td>70.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-63.7</td>\n",
       "      <td>71.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-68.7</td>\n",
       "      <td>71.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-71.6</td>\n",
       "      <td>72.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-75.4</td>\n",
       "      <td>73.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>-78.2</td>\n",
       "      <td>70.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>nan</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>-80.9</td>\n",
       "      <td>75.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>-81.8</td>\n",
       "      <td>75.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>-84.2</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>-86.9</td>\n",
       "      <td>76.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>-87.7</td>\n",
       "      <td>76.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>-87.4</td>\n",
       "      <td>76.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>-87.8</td>\n",
       "      <td>76.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>-82.9</td>\n",
       "      <td>76.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>-72.4</td>\n",
       "      <td>74.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>-63.9</td>\n",
       "      <td>72.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>-53.7</td>\n",
       "      <td>68.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>-45.2</td>\n",
       "      <td>66.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>-40.0</td>\n",
       "      <td>66.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>-37.1</td>\n",
       "      <td>66.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>-37.3</td>\n",
       "      <td>66.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>-39.7</td>\n",
       "      <td>66.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>-47.3</td>\n",
       "      <td>67.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      mean  std\n",
       "hour           \n",
       "0    -57.0 70.7\n",
       "1    -63.7 71.0\n",
       "2    -68.7 71.3\n",
       "3    -71.6 72.8\n",
       "4    -75.4 73.4\n",
       "5    -78.2 70.5\n",
       "6      nan  nan\n",
       "7    -80.9 75.4\n",
       "8    -81.8 75.0\n",
       "9    -84.2 75.5\n",
       "10   -86.9 76.2\n",
       "11   -87.7 76.9\n",
       "12   -87.4 76.2\n",
       "13   -87.8 76.9\n",
       "14   -82.9 76.1\n",
       "15   -72.4 74.1\n",
       "16   -63.9 72.0\n",
       "17   -53.7 68.7\n",
       "18   -45.2 66.4\n",
       "19   -40.0 66.6\n",
       "20   -37.1 66.5\n",
       "21   -37.3 66.7\n",
       "22   -39.7 66.7\n",
       "23   -47.3 67.7"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dat[dat['month']<=2].groupby(['hour'])['air_temp'].agg(['mean','std'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "tags": [
     "4."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "month     \n",
       "1      0.2   -144.0\n",
       "       0.4   -117.0\n",
       "       0.6    -83.0\n",
       "       0.8    -39.2\n",
       "2      0.2    -89.0\n",
       "       0.4    -50.0\n",
       "       0.6    -10.2\n",
       "       0.8     22.0\n",
       "3      0.2    -33.0\n",
       "       0.4     11.0\n",
       "       0.6     33.0\n",
       "       0.8     72.0\n",
       "4      0.2     22.0\n",
       "       0.4     44.0\n",
       "       0.6     67.0\n",
       "       0.8    111.0\n",
       "5      0.2    100.0\n",
       "       0.4    128.0\n",
       "       0.6    150.0\n",
       "       0.8    178.0\n",
       "6      0.2    128.0\n",
       "       0.4    161.0\n",
       "       0.6    194.0\n",
       "       0.8    239.0\n",
       "Name: air_temp, dtype: float64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dat.groupby(['month'])['air_temp'].apply(lambda x:x.quantile([0.2,0.4,0.6,0.8]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>air_temp</th>\n",
       "      <th>0.2</th>\n",
       "      <th>0.4</th>\n",
       "      <th>0.6</th>\n",
       "      <th>0.8</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>month</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-144.0</td>\n",
       "      <td>-117.0</td>\n",
       "      <td>-83.0</td>\n",
       "      <td>-39.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-89.0</td>\n",
       "      <td>-50.0</td>\n",
       "      <td>-10.2</td>\n",
       "      <td>22.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-33.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>33.0</td>\n",
       "      <td>72.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>22.0</td>\n",
       "      <td>44.0</td>\n",
       "      <td>67.0</td>\n",
       "      <td>111.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>100.0</td>\n",
       "      <td>128.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>178.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>128.0</td>\n",
       "      <td>161.0</td>\n",
       "      <td>194.0</td>\n",
       "      <td>239.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "air_temp    0.2    0.4   0.6   0.8\n",
       "month                             \n",
       "1        -144.0 -117.0 -83.0 -39.2\n",
       "2         -89.0  -50.0 -10.2  22.0\n",
       "3         -33.0   11.0  33.0  72.0\n",
       "4          22.0   44.0  67.0 111.0\n",
       "5         100.0  128.0 150.0 178.0\n",
       "6         128.0  161.0 194.0 239.0"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dat.groupby(['month']).apply(lambda x:x['air_temp'].quantile([0.2,0.4,0.6,0.8]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第七节课"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "dat = pd.read_csv('./data/weather-6m.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.计算每个air_temp 基于每月平均值的zscore\n",
    "\n",
    "### 2.随机选取50个数据，运用随机产生的数据，对每个月计算，数据量在每天的分布， 比如 10%的数据在第一天采集到\n",
    "\n",
    "### 3.在dat数据基础上，重新生成一个代表每天air_temp平均值的column,命名为daily_air_temp\n",
    "\n",
    "### 4.在dat中去掉每日最大air_temp 小于10的日子"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>air_temp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4335</th>\n",
       "      <td>-0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4336</th>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4337</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4338</th>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4339</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4340 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      air_temp\n",
       "0         -0.5\n",
       "1         -0.5\n",
       "2         -0.4\n",
       "3         -0.3\n",
       "4         -0.2\n",
       "...        ...\n",
       "4335      -0.0\n",
       "4336       0.1\n",
       "4337       0.0\n",
       "4338       0.1\n",
       "4339       0.0\n",
       "\n",
       "[4340 rows x 1 columns]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dat.groupby(['month'])[['air_temp']].apply(lambda x:(x-x.mean())/x.std())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "month  day\n",
       "1      1     33.3\n",
       "       23    33.3\n",
       "       24    33.3\n",
       "2      1     16.7\n",
       "       7     16.7\n",
       "       12    16.7\n",
       "       13    16.7\n",
       "       18    16.7\n",
       "       21    16.7\n",
       "3      1     12.5\n",
       "       22    12.5\n",
       "       23    25.0\n",
       "       26    25.0\n",
       "       30    12.5\n",
       "       31    12.5\n",
       "4      4     10.0\n",
       "       7     20.0\n",
       "       10    10.0\n",
       "       11    10.0\n",
       "       13    10.0\n",
       "       19    10.0\n",
       "       26    10.0\n",
       "       28    10.0\n",
       "       29    10.0\n",
       "5      7     28.6\n",
       "       14    14.3\n",
       "       19    14.3\n",
       "       23    14.3\n",
       "       24    14.3\n",
       "       29    14.3\n",
       "6      1      6.2\n",
       "       3      6.2\n",
       "       4      6.2\n",
       "       6      6.2\n",
       "       10     6.2\n",
       "       14    18.8\n",
       "       15     6.2\n",
       "       18     6.2\n",
       "       24     6.2\n",
       "       25     6.2\n",
       "       26     6.2\n",
       "       27     6.2\n",
       "       28    12.5\n",
       "dtype: float64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dat.sample(50,random_state=1).groupby(['month']).apply(lambda x:x.groupby('day').size()*100/len(x))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [],
   "source": [
    "dat['daily_air_temp']=dat.groupby(['month','day'])['air_temp'].transform('mean')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "tags": [
     "4."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>day</th>\n",
       "      <th>hour</th>\n",
       "      <th>air_temp</th>\n",
       "      <th>dew_point</th>\n",
       "      <th>wind_speed</th>\n",
       "      <th>daily_air_temp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>71</th>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>-44.0</td>\n",
       "      <td>93.0</td>\n",
       "      <td>1.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72</th>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>11.0</td>\n",
       "      <td>-44.0</td>\n",
       "      <td>88.0</td>\n",
       "      <td>1.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>73</th>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>11.0</td>\n",
       "      <td>-39.0</td>\n",
       "      <td>82.0</td>\n",
       "      <td>1.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74</th>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>17.0</td>\n",
       "      <td>-39.0</td>\n",
       "      <td>93.0</td>\n",
       "      <td>1.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75</th>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>11.0</td>\n",
       "      <td>-33.0</td>\n",
       "      <td>77.0</td>\n",
       "      <td>1.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4335</th>\n",
       "      <td>2009</td>\n",
       "      <td>6</td>\n",
       "      <td>30</td>\n",
       "      <td>19</td>\n",
       "      <td>178.0</td>\n",
       "      <td>111.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>175.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4336</th>\n",
       "      <td>2009</td>\n",
       "      <td>6</td>\n",
       "      <td>30</td>\n",
       "      <td>20</td>\n",
       "      <td>189.0</td>\n",
       "      <td>117.0</td>\n",
       "      <td>46.0</td>\n",
       "      <td>175.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4337</th>\n",
       "      <td>2009</td>\n",
       "      <td>6</td>\n",
       "      <td>30</td>\n",
       "      <td>21</td>\n",
       "      <td>183.0</td>\n",
       "      <td>111.0</td>\n",
       "      <td>36.0</td>\n",
       "      <td>175.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4338</th>\n",
       "      <td>2009</td>\n",
       "      <td>6</td>\n",
       "      <td>30</td>\n",
       "      <td>22</td>\n",
       "      <td>189.0</td>\n",
       "      <td>111.0</td>\n",
       "      <td>46.0</td>\n",
       "      <td>175.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4339</th>\n",
       "      <td>2009</td>\n",
       "      <td>6</td>\n",
       "      <td>30</td>\n",
       "      <td>23</td>\n",
       "      <td>183.0</td>\n",
       "      <td>117.0</td>\n",
       "      <td>46.0</td>\n",
       "      <td>175.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3237 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      year  month  day  hour  air_temp  dew_point  wind_speed  daily_air_temp\n",
       "71    2009      1    4     0      11.0      -44.0        93.0             1.7\n",
       "72    2009      1    4     1      11.0      -44.0        88.0             1.7\n",
       "73    2009      1    4     2      11.0      -39.0        82.0             1.7\n",
       "74    2009      1    4     3      17.0      -39.0        93.0             1.7\n",
       "75    2009      1    4     4      11.0      -33.0        77.0             1.7\n",
       "...    ...    ...  ...   ...       ...        ...         ...             ...\n",
       "4335  2009      6   30    19     178.0      111.0        41.0           175.1\n",
       "4336  2009      6   30    20     189.0      117.0        46.0           175.1\n",
       "4337  2009      6   30    21     183.0      111.0        36.0           175.1\n",
       "4338  2009      6   30    22     189.0      111.0        46.0           175.1\n",
       "4339  2009      6   30    23     183.0      117.0        46.0           175.1\n",
       "\n",
       "[3237 rows x 8 columns]"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dat.groupby(['month','day']).filter(lambda x:x['air_temp'].max()>=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 第八节课 Merge，join，combine。创建sampledata，inner/left/right/outer/full Join。多列join"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 162,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "prc = pd.read_csv(\n",
    "    io.StringIO('ticker,open,date,close\\nAAPL,426.23,2018-01-04,435.23\\nMSFT,42.3,2018-01-04,51.3\\nAAPL,436.23,2018-01-05,\\nMSFT,52.3,2018-01-05,\\n'),\n",
    "    parse_dates=['date']\n",
    ")\n",
    "prc2 = prc.assign(\n",
    "    date=pd.to_datetime('2018-01-06'),\n",
    "    close=prc.open + np.random.randn(len(prc.open))\n",
    ").drop('open', axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 163,
   "metadata": {},
   "outputs": [],
   "source": [
    "volume = pd.DataFrame({\n",
    "    'ticker': ['AAPL', 'MSFT', 'IBM', 'YHOO', 'GOOG'],\n",
    "    'volume': [1954.73,  335.83,  362.79,  858.18,  629.79]\n",
    "}).assign(date=pd.to_datetime('2018-01-05'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**通过`ticker, date` merge `prc` 和 `volume`:**\n",
    "\n",
    "### 1.保留在 prc 和 volume 中都存在的 `ticker` 和 `date`\n",
    "\n",
    "### 2.保留 `prc` 中的所有行\n",
    "\n",
    "### 3.保留所有出现在 `prc` 和 `volume` 中的行\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>open</th>\n",
       "      <th>date</th>\n",
       "      <th>close</th>\n",
       "      <th>volume</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>436.2</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "      <td>1,954.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>52.3</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "      <td>335.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker  open       date  close  volume\n",
       "0   AAPL 436.2 2018-01-05    nan 1,954.7\n",
       "1   MSFT  52.3 2018-01-05    nan   335.8"
      ]
     },
     "execution_count": 165,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cols = ['ticker', 'date']\n",
    "pd.merge(prc, volume, on=cols)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 166,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>open</th>\n",
       "      <th>date</th>\n",
       "      <th>close</th>\n",
       "      <th>volume</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>426.2</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>435.2</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>42.3</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>51.3</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>436.2</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "      <td>1,954.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>52.3</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "      <td>335.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker  open       date  close  volume\n",
       "0   AAPL 426.2 2018-01-04  435.2     nan\n",
       "1   MSFT  42.3 2018-01-04   51.3     nan\n",
       "2   AAPL 436.2 2018-01-05    nan 1,954.7\n",
       "3   MSFT  52.3 2018-01-05    nan   335.8"
      ]
     },
     "execution_count": 166,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(prc, volume, on=cols, how='left')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 167,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>open</th>\n",
       "      <th>date</th>\n",
       "      <th>close</th>\n",
       "      <th>volume</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>426.2</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>435.2</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>42.3</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>51.3</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>436.2</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "      <td>1,954.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>52.3</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "      <td>335.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>IBM</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "      <td>362.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>YHOO</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "      <td>858.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>GOOG</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "      <td>629.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker  open       date  close  volume\n",
       "0   AAPL 426.2 2018-01-04  435.2     nan\n",
       "1   MSFT  42.3 2018-01-04   51.3     nan\n",
       "2   AAPL 436.2 2018-01-05    nan 1,954.7\n",
       "3   MSFT  52.3 2018-01-05    nan   335.8\n",
       "4    IBM   nan 2018-01-05    nan   362.8\n",
       "5   YHOO   nan 2018-01-05    nan   858.2\n",
       "6   GOOG   nan 2018-01-05    nan   629.8"
      ]
     },
     "execution_count": 167,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(prc, volume, on=cols, how='outer')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第八节课 Concatenation/Binding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 171,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "pd.set_option('precision', 2)\n",
    "prc = pd.read_csv(\n",
    "    io.StringIO('ticker,open,date,close\\nAAPL,426.23,2018-01-04,435.23\\nMSFT,42.3,2018-01-04,51.3\\nAAPL,436.23,2018-01-05,\\nMSFT,52.3,2018-01-05,\\n'),\n",
    "    parse_dates=['date']\n",
    ")\n",
    "prc2 = prc.assign(\n",
    "    date=pd.to_datetime('2018-01-06'),\n",
    "    close=prc.open + np.random.randn(len(prc.open))\n",
    ").drop('open', axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 172,
   "metadata": {},
   "outputs": [],
   "source": [
    "volume = pd.DataFrame({\n",
    "    'ticker': ['AAPL', 'MSFT', 'IBM', 'YHOO', 'GOOG'],\n",
    "    'volume': [1954.73,  335.83,  362.79,  858.18,  629.79]\n",
    "}).assign(date=pd.to_datetime('2018-01-05'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.通过append, 合并prc2 和prc, 保证index不重复\n",
    "\n",
    "### 2.应用 pd.concat, 合并prc和prc2：\n",
    "* 确保index不重复\n",
    "* 只保留在prc和prc2都出现的column，从新产生一个index\n",
    "* 结果包括MultiIndex，用prc或prc2表示每个dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 175,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>open</th>\n",
       "      <th>date</th>\n",
       "      <th>close</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>426.2</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>435.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>42.3</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>51.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>436.2</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>52.3</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>425.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>41.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>435.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>51.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker  open       date  close\n",
       "0   AAPL 426.2 2018-01-04  435.2\n",
       "1   MSFT  42.3 2018-01-04   51.3\n",
       "2   AAPL 436.2 2018-01-05    nan\n",
       "3   MSFT  52.3 2018-01-05    nan\n",
       "4   AAPL   nan 2018-01-06  425.6\n",
       "5   MSFT   nan 2018-01-06   41.9\n",
       "6   AAPL   nan 2018-01-06  435.6\n",
       "7   MSFT   nan 2018-01-06   51.9"
      ]
     },
     "execution_count": 175,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prc.append(prc2,ignore_index=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 177,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>open</th>\n",
       "      <th>date</th>\n",
       "      <th>close</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>426.2</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>435.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>42.3</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>51.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>436.2</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>52.3</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>425.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>41.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>435.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>51.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker  open       date  close\n",
       "0   AAPL 426.2 2018-01-04  435.2\n",
       "1   MSFT  42.3 2018-01-04   51.3\n",
       "2   AAPL 436.2 2018-01-05    nan\n",
       "3   MSFT  52.3 2018-01-05    nan\n",
       "4   AAPL   nan 2018-01-06  425.6\n",
       "5   MSFT   nan 2018-01-06   41.9\n",
       "6   AAPL   nan 2018-01-06  435.6\n",
       "7   MSFT   nan 2018-01-06   51.9"
      ]
     },
     "execution_count": 177,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([prc,prc2],ignore_index=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 180,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>date</th>\n",
       "      <th>close</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>435.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>51.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>425.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>41.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>435.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>51.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ticker       date  close\n",
       "0   AAPL 2018-01-04  435.2\n",
       "1   MSFT 2018-01-04   51.3\n",
       "2   AAPL 2018-01-05    nan\n",
       "3   MSFT 2018-01-05    nan\n",
       "4   AAPL 2018-01-06  425.6\n",
       "5   MSFT 2018-01-06   41.9\n",
       "6   AAPL 2018-01-06  435.6\n",
       "7   MSFT 2018-01-06   51.9"
      ]
     },
     "execution_count": 180,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([prc,prc2],ignore_index=True,join='inner')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 189,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>ticker</th>\n",
       "      <th>open</th>\n",
       "      <th>date</th>\n",
       "      <th>close</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">prc</th>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>426.2</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>435.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>42.3</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>51.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>436.2</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>52.3</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>nan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">prc2</th>\n",
       "      <th>0</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>425.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>41.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AAPL</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>435.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MSFT</td>\n",
       "      <td>nan</td>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>51.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       ticker  open       date  close\n",
       "prc  0   AAPL 426.2 2018-01-04  435.2\n",
       "     1   MSFT  42.3 2018-01-04   51.3\n",
       "     2   AAPL 436.2 2018-01-05    nan\n",
       "     3   MSFT  52.3 2018-01-05    nan\n",
       "prc2 0   AAPL   nan 2018-01-06  425.6\n",
       "     1   MSFT   nan 2018-01-06   41.9\n",
       "     2   AAPL   nan 2018-01-06  435.6\n",
       "     3   MSFT   nan 2018-01-06   51.9"
      ]
     },
     "execution_count": 189,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([prc,prc2],keys=['prc','prc2'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第九节课"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 197,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',\n",
    "                           'two'],\n",
    "                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],\n",
    "                   'baz': [1, 2, 3, 4, 5, 6],\n",
    "                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.通过pivot转变数据格式，以bar作为index，foo作为column，zoo 为value\n",
    "\n",
    "### 2.通过melt把数据格式恢复"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 201,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">zoo</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>foo</th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bar</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A</th>\n",
       "      <td>x</td>\n",
       "      <td>q</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>B</th>\n",
       "      <td>y</td>\n",
       "      <td>w</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>C</th>\n",
       "      <td>z</td>\n",
       "      <td>t</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    zoo    \n",
       "foo one two\n",
       "bar        \n",
       "A     x   q\n",
       "B     y   w\n",
       "C     z   t"
      ]
     },
     "execution_count": 201,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pt=df.pivot(index='bar',columns='foo',values=['zoo'])\n",
    "pt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 205,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MultiIndex([('zoo', 'one'),\n",
       "            ('zoo', 'two')],\n",
       "           names=[None, 'foo'])"
      ]
     },
     "execution_count": 205,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pt.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 210,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>foo</th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bar</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A</th>\n",
       "      <td>x</td>\n",
       "      <td>q</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>B</th>\n",
       "      <td>y</td>\n",
       "      <td>w</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>C</th>\n",
       "      <td>z</td>\n",
       "      <td>t</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "foo one two\n",
       "bar        \n",
       "A     x   q\n",
       "B     y   w\n",
       "C     z   t"
      ]
     },
     "execution_count": 210,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pt=pt.droplevel(0,axis=1)\n",
    "pt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 221,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>bar</th>\n",
       "      <th>foo</th>\n",
       "      <th>zoo</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A</td>\n",
       "      <td>one</td>\n",
       "      <td>x</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>B</td>\n",
       "      <td>one</td>\n",
       "      <td>y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>C</td>\n",
       "      <td>one</td>\n",
       "      <td>z</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A</td>\n",
       "      <td>two</td>\n",
       "      <td>q</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>B</td>\n",
       "      <td>two</td>\n",
       "      <td>w</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>C</td>\n",
       "      <td>two</td>\n",
       "      <td>t</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  bar  foo zoo\n",
       "0   A  one   x\n",
       "1   B  one   y\n",
       "2   C  one   z\n",
       "3   A  two   q\n",
       "4   B  two   w\n",
       "5   C  two   t"
      ]
     },
     "execution_count": 221,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pt.reset_index().melt(id_vars='bar',value_name='zoo')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第九节课 高级 Merging & Reshaping"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',\n",
    "                           'two'],\n",
    "                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],\n",
    "                   'baz': [1, 2, 3, 4, 5, 6],\n",
    "                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.对每一个foo,和 bar的组合，算出 baz 的平均值\n",
    "\n",
    "### 2.改变数据结构，以foo为index，bar为column，baz作为value\n",
    "\n",
    "3.改变数据结构，以bar为index，foo为column，baz作为value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 223,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "foo  bar\n",
       "one  A      1\n",
       "     B      2\n",
       "     C      3\n",
       "two  A      4\n",
       "     B      5\n",
       "     C      6\n",
       "Name: baz, dtype: int64"
      ]
     },
     "execution_count": 223,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(['foo','bar'])['baz'].mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 224,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">baz</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bar</th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>foo</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    baz      \n",
       "bar   A  B  C\n",
       "foo          \n",
       "one   1  2  3\n",
       "two   4  5  6"
      ]
     },
     "execution_count": 224,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pivot_table(index='foo',columns='bar',values=['baz'],aggfunc='mean')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 226,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">baz</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>foo</th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bar</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>B</th>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>C</th>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    baz    \n",
       "foo one two\n",
       "bar        \n",
       "A     1   4\n",
       "B     2   5\n",
       "C     3   6"
      ]
     },
     "execution_count": 226,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pivot_table(index='bar',columns='foo',values=['baz'],aggfunc='mean')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第十节课-1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 236,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp5_df = pd.read_csv(\n",
    "    'sp500.csv', usecols=['date', 'adj_close'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.把date列的变为时间类型\n",
    "\n",
    "2.通过date列，产生年，月，日 列\n",
    "\n",
    "3.对 date 加 5 分钟"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 237,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [],
   "source": [
    "df['date']=pd.to_datetime(df['date'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 238,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [],
   "source": [
    "df['year']=df['date'].dt.year\n",
    "df['month']=df['date'].dt.month\n",
    "df['day']=df['date'].dt.day"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 239,
   "metadata": {
    "tags": [
     "3"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   2015-12-29 00:05:00\n",
       "1   2015-12-30 00:05:00\n",
       "2   2015-12-31 00:05:00\n",
       "3   2016-01-04 00:05:00\n",
       "4   2015-12-29 00:05:00\n",
       "5   2015-12-30 00:05:00\n",
       "6   2015-12-31 00:05:00\n",
       "7   2016-01-04 00:05:00\n",
       "Name: date, dtype: datetime64[ns]"
      ]
     },
     "execution_count": 239,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import datetime\n",
    "df['date']+datetime.timedelta(minutes=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第十节课-2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 240,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp5_df = pd.read_csv(\n",
    "    'sp500.csv', usecols=['date', 'adj_close'], \n",
    "    parse_dates=['date'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.产生一个时间数据 January 1st, 1993 16:00 \n",
    "\n",
    "2.产生一个时间index:\n",
    "\n",
    "    从2010-1-1开始，连续5天.\n",
    "    \n",
    "    从2010-1-1到2010-1-15，所有的工作日\n",
    "\n",
    "3.通过sp5_df的adj_close产生一个series,date作为index,并且对index排序. 新series的名字为sp5\n",
    "\n",
    "4.用两种不同的方法选择1995,1,3\n",
    "\n",
    "5.选择sp5中所有1995年3月的数据\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 241,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('1993-01-01 16:00:00')"
      ]
     },
     "execution_count": 241,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Timestamp('1993-01-01 16:00')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 242,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatetimeIndex(['2010-01-01', '2010-01-04', '2010-01-05', '2010-01-06',\n",
       "               '2010-01-07', '2010-01-08', '2010-01-11', '2010-01-12',\n",
       "               '2010-01-13', '2010-01-14', '2010-01-15'],\n",
       "              dtype='datetime64[ns]', freq='B')"
      ]
     },
     "execution_count": 242,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.date_range(start='2010-01-01', periods=5, freq='D')\n",
    "\n",
    "pd.date_range(start='2010-01-01', end='2010-01-15', freq='B')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 251,
   "metadata": {
    "tags": [
     "3."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "date\n",
       "1950-01-03      16.7\n",
       "1950-01-04      16.9\n",
       "1950-01-05      16.9\n",
       "1950-01-06      17.0\n",
       "1950-01-09      17.1\n",
       "               ...  \n",
       "2016-04-06   2,066.7\n",
       "2016-04-07   2,041.9\n",
       "2016-04-08   2,047.6\n",
       "2016-04-11   2,042.0\n",
       "2016-04-12   2,061.7\n",
       "Name: adj_close, Length: 16676, dtype: float64"
      ]
     },
     "execution_count": 251,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sp5 = pd.Series(sp5_df.adj_close.values, index=sp5_df.date, \n",
    "    name='adj_close').sort_index()\n",
    "sp5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 244,
   "metadata": {
    "tags": [
     "4."
    ]
   },
   "outputs": [],
   "source": [
    "d1a = sp5['19950103']\n",
    "d1b = sp5['1995-01-03']\n",
    "d1c = sp5[datetime.datetime(1995, 1, 3)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 247,
   "metadata": {
    "tags": [
     "5."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "date\n",
       "1995-03-01   485.6\n",
       "1995-03-02   485.1\n",
       "1995-03-03   485.4\n",
       "1995-03-06   485.6\n",
       "1995-03-07   482.1\n",
       "1995-03-08   483.1\n",
       "1995-03-09   483.2\n",
       "1995-03-10   489.6\n",
       "1995-03-13   490.0\n",
       "1995-03-14   492.9\n",
       "1995-03-15   491.9\n",
       "1995-03-16   495.4\n",
       "1995-03-17   495.5\n",
       "1995-03-20   496.1\n",
       "1995-03-21   495.1\n",
       "1995-03-22   495.7\n",
       "1995-03-23   496.0\n",
       "1995-03-24   501.0\n",
       "1995-03-27   503.2\n",
       "1995-03-28   503.9\n",
       "1995-03-29   503.1\n",
       "1995-03-30   502.2\n",
       "1995-03-31   500.7\n",
       "Name: adj_close, dtype: float64"
      ]
     },
     "execution_count": 247,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sp5['1995-03']\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第十节课-3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 248,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp5_df = pd.read_csv(\n",
    "    'sp500.csv', usecols=['date', 'adj_close'], \n",
    "    parse_dates=['date'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.计算每一天adj_close相较于前一天变化的百分比\n",
    "\n",
    "2.用resample计算sp5_df的 adj_close 的月平均值\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 249,
   "metadata": {
    "tags": [
     "1."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0        nan\n",
       "1       -0.0\n",
       "2        0.0\n",
       "3       -0.0\n",
       "4        0.0\n",
       "        ... \n",
       "16671    0.0\n",
       "16672   -0.0\n",
       "16673   -0.0\n",
       "16674   -0.0\n",
       "16675   -0.0\n",
       "Name: adj_close, Length: 16676, dtype: float64"
      ]
     },
     "execution_count": 249,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sp5_df['adj_close'] / sp5_df['adj_close'].shift(1) - 1\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 250,
   "metadata": {
    "tags": [
     "2."
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>adj_close</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1950-01-31</th>\n",
       "      <td>16.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1950-02-28</th>\n",
       "      <td>17.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1950-03-31</th>\n",
       "      <td>17.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1950-04-30</th>\n",
       "      <td>17.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1950-05-31</th>\n",
       "      <td>18.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015-12-31</th>\n",
       "      <td>2,054.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016-01-31</th>\n",
       "      <td>1,918.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016-02-29</th>\n",
       "      <td>1,904.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016-03-31</th>\n",
       "      <td>2,022.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016-04-30</th>\n",
       "      <td>2,055.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>796 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            adj_close\n",
       "date                 \n",
       "1950-01-31       16.9\n",
       "1950-02-28       17.2\n",
       "1950-03-31       17.3\n",
       "1950-04-30       17.8\n",
       "1950-05-31       18.4\n",
       "...               ...\n",
       "2015-12-31    2,054.1\n",
       "2016-01-31    1,918.6\n",
       "2016-02-29    1,904.4\n",
       "2016-03-31    2,022.0\n",
       "2016-04-30    2,055.5\n",
       "\n",
       "[796 rows x 1 columns]"
      ]
     },
     "execution_count": 250,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sp5_df.set_index('date').resample('M').mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
